Nucleic Acids, Proteins, and Antibodies 

This application refers to a "Sequence Listing" that is provided on electronic 
media in computer readable form pursuant to Administrative Instructions Section 801(a)(i) 
and as a paper copy. The Sequence Listing forms a part of this description pursuant to 
Rule 5.2 and Administrative Instructions Sections 801 to 806, and is hereby incorporated 
in its entirety. 

The Sequence Listing is provided as an electronic file (PA131PCTSL.txt, 
5,210,863 bytes in size, created on May 18, 2001) on three identical compact discs 
(CD-R), labeled "COPY 1," "COPY 2," and "CRF." The Sequence Listing complies with 
Annex C of the Administrative Instructions, and may be viewed, for example, on an 
IBM-PC machine running the MS-Windows operating system by using the V viewer software, 
version 2000 (see World Wide Web URL: http://www.fileviewer.com). 

Field of the Invention 
[0001] The present invention relates to novel proteins. More specifically, isolated 
nucleic acid molecules are provided encoding novel polypeptides. Novel polypeptides and 
antibodies that bind to these polypeptides are provided. Also provided are vectors, host 
cells, and recombinant and synthetic methods for producing human polynucleotides and/or 
polypeptides, and antibodies. The invention further relates to diagnostic and therapeutic 
methods useful for diagnosing, treating, preventing and/or prognosing disorders related to 
these novel polypeptides. The invention further relates to screening methods for 
identifying agonists and antagonists of polynucleotides and polypeptides of the invention. 
The present invention further relates to methods and/or compositions for inhibiting or 
enhancing the production and function of the polypeptides of the present invention. 

Background of the Invention 

[0002] Protein transport is a quintessential process for both prokaryotic and eukaryotic 
cells. Transport of an individual protein usually occurs via an amino-terminal signal 
sequence, which directs, or targets, the protein from its ribosomal assembly site to a 
particular cellular or extracellular location. Transport may involve any combination of 
several of the following steps: contact with a chaperone, unfolding, interaction with a 
receptor and/or a pore complex, addition of energy, and refolding. Moreover, an 
extracellular protein may be produced as an inactive precursor. Once the precursor has 
been exported, removal of the signal sequence by a signal peptidase activates the protein. 

WO 01/90304 PCT/US01/16450 

[0003] Although amino-terminal signal sequences vary substantially, many patterns 
and overall properties are shared. Recently, hidden Markov models (HMMs), statistical 
alternatives to the FASTA and Smith-Waterman algorithms, have been used to find shared 
patterns, specifically consensus sequences (Pearson, W.R. and D.J. Lipman PNAS 
85:2444-48 (1988); Smith, T.F. and M.S. Waterman J. Mol. Biol. 147:195-97 (1981)). 
Although they were initially developed to examine speech recognition patterns, HMMs 
have been used in biology to analyze protein and DNA sequences and to model protein 
structure (Krogh, A. et al. J. Mol. Biol. 235:1501-31 (1994); Collin, M. et al. Protein Sci. 
2:305-14 (1993)). HMMs have a formal probabilistic basis and use position-specific 
scores for amino acids or nucleotides and for opening and extending an insertion or 
deletion. The algorithms are quite flexible in that they incorporate information from newly 
identified sequences to build even more successful patterns. Other methods exist to 
identify membrane associated proteins. Klein et al. have developed a method ("ALOM", 
also called KKD) to detect potential transmembrane segments in polypeptides (Klein, 
P. et al. Biochim. Biophys. Acta, 815:468 (1985)). It attempts to identify the most 
probable transmembrane segment from the average hydrophobicity value over a range of 
amino acid residues. It predicts whether the segment is a transmembrane segment 
(INTEGRAL) or not (PERIPHERAL) and thus can suggest membrane association of a 
polypeptide. 
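
The windowed hydrophobicity scan described above can be illustrated with a minimal sketch. This is not the ALOM program itself: the Kyte-Doolittle hydropathy scale, the 17-residue window, and the 1.6 cutoff are assumptions chosen for illustration, and the function names are hypothetical. The published method uses its own hydrophobicity scale and a discriminant function.

```python
# Illustrative sketch only; NOT the ALOM/KKD implementation.
# Assumptions: Kyte-Doolittle hydropathy values, a 17-residue window,
# and an arbitrary 1.6 cutoff for calling a segment membrane-spanning.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(sequence, window=17):
    """Return (start, end, average) for the window with the highest mean
    hydropathy. Positions are 1-based and inclusive."""
    sequence = sequence.upper()
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the scanning window")
    best_start, best_avg = 0, float("-inf")
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        avg = sum(KYTE_DOOLITTLE[residue] for residue in segment) / window
        if avg > best_avg:  # strict comparison keeps the first best window
            best_start, best_avg = start, avg
    return best_start + 1, best_start + window, best_avg


def classify_membrane_association(sequence, window=17, cutoff=1.6):
    """Label the best window INTEGRAL or PERIPHERAL (hypothetical cutoff)."""
    start, end, avg = most_hydrophobic_window(sequence, window)
    label = "INTEGRAL" if avg >= cutoff else "PERIPHERAL"
    return label, start, end, avg
```

Calling `classify_membrane_association` on a polypeptide sequence returns the label together with the position of the most hydrophobic window, which is the kind of per-sequence "hit" position a table column could then report.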

[0004] Some examples of the protein families which are known to be plasma 
membrane associated are receptors (nuclear, 4 transmembrane, G protein coupled, and 
tyrosine kinase), cytokines (chemokines), hormones (growth and differentiation factors), 
neuropeptides and vasomediators, protein kinases, phosphatases, phospholipases, 
phosphodiesterases, nucleotide cyclases, matrix molecules (adhesion, cadherin, 
extracellular matrix molecules, integrin, and selectin), seven transmembrane receptors, ion 
channels (calcium, chloride, potassium, and sodium), proteases, transporter/pumps (amino 
acid, protein, sugar, metal and vitamin; calcium, phosphate, potassium, and sodium) and 
regulatory proteins. Descriptions of some of these proteins (seven transmembrane 
receptors, kinases, matrix proteins, fibronectins, defensins, EF-hand domain containing 
proteins, mac/perforin family members, pancreatic hormones, serine carboxypeptidases, 
tumor necrosis factors (TNFs)) and diseases associated with their dysfunction follow. 

Seven transmembrane receptors- 

[0005] The seven transmembrane receptors (also known as heptahelical, serpentine, or 
G protein-coupled receptors) comprise a superfamily of structurally related molecules. 
Possible relationships among seven transmembrane receptors (7TM receptors) for which 
amino acid sequence had previously been reported are reviewed in Probst et al., DNA and 
Cell Biology, 11(1):1-20 (1992). Briefly, the 7TM receptors exhibit detectable amino acid 
sequence similarity and all appear to share a number of structural characteristics 
including: an extracellular amino terminus; seven predominantly hydrophobic α-helical 
domains (of about 20-30 amino acids) which are believed to span the cell membrane and 
are referred to as transmembrane domains TM 1-7; approximately twenty well-conserved 
amino acids; and a cytoplasmic carboxy terminus. 

[0006] Each 7TM receptor is predicted to associate with a particular G protein at the 
intracellular surface of the plasma membrane. The binding of the receptor to its ligand is 
thought to result in activation (i.e., the exchange of GTP for GDP on the α-subunit) of the 
G protein which in turn stimulates specific intracellular signal-transducing enzymes and 
channels. Thus, the function of each 7TM receptor is to discriminate its specific ligand 
from the complex extracellular milieu and then to activate G proteins to produce a specific 
intracellular signal. Transmembrane domain-3 (TM3) is believed to be essential in signal 
transduction (Cotecchia et al., Proc. Natl. Acad. Sci. USA, 87:2896-2900 (1990)). Other 
regions may be essential for biological activity as well (Lefkowitz, Nature, 365:603-604 
(1993)). 

[0007] Mutations in the third intracellular loop of one 7TM receptor (the thyrotropin 
receptor) and in the adjacent sixth transmembrane domain of another 7TM receptor (the 
luteinizing hormone receptor) have been reported to be the genetic defects responsible for 
an uncommon form of hyperthyroidism (Parma et al., Nature, 365:649-651 (1993)) and for 
familial precocious puberty (Shenker et al., Nature, 365:652-654 (1993)), respectively. In 
both cases the mutations result in constitutive activation of the G protein receptors. Other 
studies have shown that mutations that prevent the activation of 7TM receptors 
underlie states of hormone resistance that cause diseases such as 
congenital nephrogenic diabetes insipidus. See Rosenthal et al., J. Biol. Chem., 
268:13030-13033 (1993). Still other studies have shown that several 7TM receptors can 
function as protooncogenes and be activated by mutational alteration. See, for example, 
Allen et al., Proc. Natl. Acad. Sci. USA, 88:11354-11358 (1991), which suggests that 
spontaneously occurring mutations in some 7TM receptors may alter the normal function 
of the receptors and result in uncontrolled cell growth associated with human disease 
states such as neoplasia and atherosclerosis. Therefore, mutations in 7TM receptors may 
underlie a number of human pathologies. 

Kinases- 

[0008] The kinases comprise the largest known group of proteins, a superfamily of 
enzymes with widely varied functions and specificities. Kinases regulate many different 
cell proliferation, differentiation, and signaling processes by adding phosphate groups to 
proteins. Receptor mediated extracellular events trigger the transfer of these high energy 
phosphate groups and activate intracellular signaling cascades. Activation is roughly 
analogous to turning on a molecular switch, and in cases where signaling is 
uncontrolled, may be associated with or produce inflammation and cancer. 
[0009] Almost all kinases contain a similar 250-300 amino acid catalytic domain. The 
N-terminal domain, which contains subdomains I-IV, generally folds into a two-lobed 
structure which binds and orients the ATP (or GTP) donor molecule. The larger 
C-terminal lobe, which contains subdomains VIA-XI, binds the protein substrate and carries 
out the transfer of the gamma phosphate from ATP to the hydroxyl group of a serine, 
threonine, or tyrosine residue. Subdomain V spans the two lobes. 
[0010] The kinases may be categorized into families by the different amino acid 
sequences (between 5 and 100 residues) located on either side of, or inserted into loops of, 
the kinase domain. These amino acid sequences allow the regulation of each kinase as it 
recognizes and interacts with its target protein. The primary structure of the kinase domain 
is conserved and contains specific residues and identifiable motifs or patterns of amino 
acids. The serine/threonine kinases represent one family which preferentially 
phosphorylates serine or threonine residues. Many serine/threonine kinases, including 
those from human, rabbit, rat, mouse, and chicken cells and tissues, have been described 
(Hardie, G. and Hanks, S. (1995) The Protein Kinase Facts Book, Vol. 1:7-20, Academic 
Press, San Diego, CA). 

Matrix Proteins- 

[0011] The matrix proteins (MPs) provide structural support, cell and tissue identity, 
and autocrine, paracrine and juxtacrine properties for most eukaryotic cells (McGowan, 
S.E. (1992) FASEB J. 6:2895-2904). MPs include adhesion molecules, integrins and 
selectins, cadherins, lectins, lipocalins, and extracellular matrix proteins (ECMs). MPs 
possess many different domains which interact with soluble, extracellular molecules. 
These domains include collagen-like domains, EGF-like domains, immunoglobulin-like 
domains, fibronectin-like domains, type A domain of von Willebrand factor (vWFA)-like 
modules, ankyrin repeat modules, RGD or RGD-like sequences, carbohydrate-binding 
domains, and calcium-binding domains. 

[0012] The diversity, distribution and biochemistry of MPs are indicative of their many, 
overlapping roles in cell proliferation and cell signaling. MPs function in the formation, 
growth, remodeling, and maintenance of bone, and in the mediation and regulation of 
inflammation. Biochemical changes that result from congenital, epigenetic, or infectious 
diseases affect the expression and balance of MPs. This balance, in turn, affects the 
activation, proliferation, differentiation, and migration of leukocytes and determines 
whether the immune response is appropriate or self-destructive (Roman, J. (1996) 
Immunol. Res. 15:163-178). 

Fibronectins- 

[0013] Fibronectin proteins play a vital role in the structure and function of the 
extracellular matrix (ECM). Defects in the function of the ECM are thought to be involved 
in diseases such as osteoporosis, atherosclerosis, arthritis, and fibrotic diseases. 
Fibronectin enables cells to adhere to the ECM, and influences the growth and migration 
of cells as well as the organization of the cytoskeleton. As a major component of the 
ECM, fibronectin is thought to influence such processes as cellular adhesion and 
migration, particularly during development, as well as processes such as wound repair 
(R.O. Hynes, PNAS, 96:2588-90 (1999)). 
[0014] Fibronectin is a disulfide-linked dimeric glycoprotein composed of type I, type 
II, and type III fibronectin repeats. Type I repeats are approximately 45 amino acids in 
length and are located at the amino- and carboxy-termini of the protein. Type II domains 
are approximately 40-60 amino acids in length, and contain four conserved cysteines 
involved in disulfide bonding. It is thought that the type II domains may function in 
collagen binding. There are approximately 15-17 type III domains, arranged in tandem in 
the middle of the protein, that are thought to provide elasticity to fibronectin. 

Defensins- 

[0015] Mammalian defensins are produced by the epidermis and mucosal epithelium as 
innate effector molecules thought to function in an antimicrobial capacity. Defensins are 
cytotoxic peptides with a broad range of activity on gram-positive and gram-negative bacteria, 
fungi, parasites, viruses, and mycobacteria. The two characterized defensins are the alpha 
and beta defensins. The alpha-defensins are produced by neutrophils and macrophages, 
while the beta-defensins are produced by epithelia (Singh, P.K., et al., PNAS, 95:14961-66 
(1998); Lillard, J.W., et al., PNAS, 96:651-56 (1999)). 

[0016] Defensin peptides range in length from approximately 29 to 35 amino acids, 
and include six conserved cysteine residues involved in disulfide bond formation and 
protein folding. The distribution and connection of the cysteine residues differ between 
the alpha and beta defensins. 

EF-hand domain containing proteins- 

[0017] Calcium is well known to be essential for cell signaling. However, calcium also 
plays a role in such cellular processes as protein processing and membrane traffic to and 
through the Golgi. Many proteins thought to be involved in the binding of calcium 
accomplish this in part through a protein calcium-binding domain known as the EF-hand 
domain. 

[0018] The domain consists of a twelve residue loop flanked by a twelve residue alpha- 
helical domain on both sides. In the EF hand loop, the calcium ion is situated in a 
coordinated pentagonal bipyramidal configuration. An invariant Glutamic acid or Aspartic 
acid residue provides two oxygens for liganding the calcium ion. 
[0019] Proteins containing this domain include aequorin and Renilla luciferin binding 
protein (LBP), Recoverins, Calmodulin, Calpain small and large chains, Calretinin, 
Calcyclin, Fimbrin, Serine/Threonine protein phosphatase, and Diacylglycerol kinase, for 
example. 

MAC/Perforin Family Members- 

[0020] The Membrane Attack Complex (MAC) is one of the sequentially activated, 
membrane bound complexes of the complement system used to eliminate diseased or 
non-compliant cells. Under this system, activated C5b sequentially binds C6 and C7, which 
insert into cell membranes. This complex then binds one molecule of C8, followed by 
between 1 and 18 molecules of C9, which polymerizes to generate a transmembrane 
channel. These transmembrane channels pierce the membrane, increasing the cell's 
permeability. These channels permit small molecules in the cell to exchange with the 
medium. Therefore, water is osmotically drawn into the cell, eventually resulting in the 
cell bursting. 

[0021] Similarly, Perforin is a molecule produced by cytotoxic T cells. In the presence 
of calcium, Perforin polymerizes into transmembrane channels capable of lysing a variety 
of target cells in a nonspecific manner. 

Pancreatic Hormones- 

[0022] Pancreatic hormone (PP) is a peptide of approximately 80 amino acids in length 
that is generated in pancreatic islets of Langerhans and subsequently secreted. Pancreatic 
hormone is thought to function as a regulator of pancreatic and gastrointestinal functions. 
[0023] Representative members of the pancreatic hormones family of proteins include 
Neuropeptide Y, Peptide YY, and skin peptide YY. These proteins may be useful as 
therapeutics for controlling secretion of the gonadotropin-releasing hormone, disorders 
related to feeding, vasoconstrictory actions, and colonic mobility, as well as antibacterial 
and antifungal activity. 

Serine Carboxypeptidases- 
[0024] Carboxypeptidases catalyze the hydrolysis of C-terminal residues of 
polypeptides. Carboxypeptidases are identified either as metallo-carboxypeptidases or 
serine-carboxypeptidases. 

[0025] Serine carboxypeptidases have the ability to hydrolyze peptides as well as 
peptide amides from the C-terminus, and have a preferential release of a C-terminal 
arginine or lysine residue. Their subcellular location is usually extracellular or 
intracellular. The catalytic activity of serine carboxypeptidases is provided by a charge 
relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which is 
itself hydrogen bonded to a serine. 

Tumor necrosis factors (TNF)- 

[0026] Tumor necrosis factors (TNF) alpha and beta are cytokines which act 
through TNF receptors to regulate numerous biological processes, including protection 
against infection and induction of shock and inflammatory disease. The TNF molecules 
belong to the "TNF-ligand" superfamily, and act together with their receptors or 
counter-ligands, the "TNF-receptor" superfamily. So far, nine members of the TNF ligand 
superfamily have been identified and ten members of the TNF-receptor superfamily have 
been characterized. 

[0027] Many members of the TNF-ligand superfamily are expressed by activated 
T-cells, implying that they are necessary for T-cell interactions with other cell types which 
underlie cell ontogeny and functions (Meager, A., supra). 

[0028] Considerable insight into the essential functions of several members of the TNF 
receptor family has been gained from the identification and creation of mutants that 
abolish the expression of these proteins. For example, naturally occurring mutations in the 
FAS antigen and its ligand cause lymphoproliferative disease (Watanabe-Fukunaga, R. et 
al., Nature 356:314 (1992)), perhaps reflecting a failure of programmed cell death. 
Mutations of the CD40 ligand cause an X-linked immunodeficiency state characterized by 
high levels of immunoglobulin M and low levels of immunoglobulin G in plasma, 
indicating faulty T-cell-dependent B-cell activation (Allen, R.C. et al., Science 259:990 
(1993)). Targeted mutations of the low affinity nerve growth factor receptor cause a 
disorder characterized by faulty sensory innervation of peripheral structures (Lee, K.F. et 
al., Cell 69:737 (1992)). 
[0029] TNF and LT-α are capable of binding to two TNF receptors (the 55- and 75-kd 
TNF receptors). A large number of biological effects elicited by TNF and LT-α, acting 
through their receptors, include hemorrhagic necrosis of transplanted tumors, cytotoxicity, 
a role in endotoxic shock, inflammation, immunoregulation, proliferation and anti-viral 
responses, as well as protection against the deleterious effects of ionizing radiation. TNF 
and LT-α are involved in the pathogenesis of a wide range of diseases, including 
endotoxic shock, cerebral malaria, tumors, autoimmune disease, AIDS and graft-host 
rejection (Beutler, B. and Von Huffel, C., Science 264:667-668 (1994)). Mutations in the 
p55 receptor cause increased susceptibility to microbial infection. 
[0030] Moreover, an about 80 amino acid domain near the C-terminus of TNFR1 (p55) 
and Fas was reported as the "death domain," which is responsible for transducing signals 
for programmed cell death (Tartaglia et al., Cell 74:845 (1993)). 

[0031] Plasma membrane associated proteins with a predominant tissue expression 
pattern are important targets for targeted drug delivery, tumor-targeted therapy 
(including, but not limited to, radioimmunotherapy), antibody mediated attack of diseased 
tissues or cancers, and immune mediated cytotoxicity. 

[0032] The discovery of new plasma membrane associated proteins and the 
polynucleotides encoding these molecules thus satisfies a need in the art by not only 
providing new compositions useful in the diagnosis, treatment, and prevention of diseases 
associated with cell proliferation and cell signaling, particularly cancer, immune response 
and neuronal disorders; but also by providing new targets for immune based therapies. 

Summary of the Invention 
[0033] The present invention relates to novel proteins. More specifically, isolated 
nucleic acid molecules are provided encoding novel polypeptides. Novel polypeptides and 
antibodies that bind to these polypeptides are provided. Also provided are vectors, host 
cells, and recombinant and synthetic methods for producing human polynucleotides and/or 
polypeptides, and antibodies. The invention further relates to diagnostic and therapeutic 
methods useful for diagnosing, treating, preventing and/or prognosing disorders related to 
these novel polypeptides. The invention further relates to screening methods for 
identifying agonists and antagonists of polynucleotides and polypeptides of the invention. 
The present invention further relates to methods and/or compositions for inhibiting or 
enhancing the production and function of the polypeptides of the present invention. 

Detailed Description 

Tables 

[0034] Table 1 summarizes some of the polynucleotides encompassed by the invention 
(including cDNA clones related to the sequences (Clone ID NO:Z), contig sequences 
(contig identifier (Contig ID:) and contig nucleotide sequence identifier (SEQ ID NO:X))) 
and further summarizes certain characteristics of these polynucleotides and the 
polypeptides encoded thereby. The first column provides the gene number in the 
application for each clone identifier. The second column provides a unique clone 
identifier, "Clone ID NO:Z", for a cDNA clone related to each contig sequence disclosed 
in Table 1. The third column provides a unique contig identifier, "Contig ID:" for each of 
the contig sequences disclosed in Table 1. The fourth column provides the sequence 
identifier, "SEQ ID NO:X", for each of the contig sequences disclosed in Table 1. The 
fifth column, "ORF (From-To)", provides the location (i.e., nucleotide position numbers) 
within the polynucleotide sequence of SEQ ID NO:X that delineate the preferred open 
reading frame (ORF) that encodes the amino acid sequence shown in the sequence listing 
and referenced in Table 1 as SEQ ID NO:Y (column 6). Column 7 lists residues 
comprising predicted epitopes contained in the polypeptides encoded by each of the 
preferred ORFs (SEQ ID NO:Y). Identification of potential immunogenic regions was 
performed according to the method of Jameson and Wolf (CABIOS, 4:181-186 (1988)); 
specifically, the Genetics Computer Group (GCG) implementation of this algorithm, 
embodied in the program PEPTIDESTRUCTURE (Wisconsin Package v10.0, Genetics 
Computer Group (GCG), Madison, Wis.). This method returns a measure of the 
probability that a given residue is found on the surface of the protein. Regions where the 
antigenic index score is greater than 0.9 over at least 6 amino acids are indicated in Table 
1 as "Predicted Epitopes". In particular embodiments, polypeptides of the invention 
comprise, or alternatively consist of, one, two, three, four, five or more of the predicted 
epitopes described in Table 1. It will be appreciated that depending on the analytical 
criteria used to predict antigenic determinants, the exact address of the determinant may 
vary slightly. Column 8, "Tissue Distribution", shows the expression profile of tissue, 
cells, and/or cell line libraries which express the polynucleotides of the invention. The 
first number in column 8 (preceding the colon), represents the tissue/cell source identifier 
code corresponding to the key provided in Table 4. Expression of these polynucleotides 
was not observed in the other tissues and/or cell libraries tested. For those identifier codes 
in which the first two letters are not "AR", the second number in column 8 (following the 
colon), represents the number of times a sequence corresponding to the reference 
polynucleotide sequence (e.g., SEQ ID NO:X) was identified in the tissue/cell source. 
Those tissue/cell source identifier codes in which the first two letters are "AR" designate 
information generated using DNA array technology. Utilizing this technology, cDNAs 
were amplified by PCR and then transferred, in duplicate, onto the array. Gene expression 
was assayed through hybridization of first strand cDNA probes to the DNA array. cDNA 
probes were generated from total RNA extracted from a variety of different tissues and 
cell lines. Probe synthesis was performed in the presence of 33P dCTP, using oligo(dT) to 
prime reverse transcription. After hybridization, high stringency washing conditions were 
employed to remove non-specific hybrids from the array. The remaining signal, emanating 
from each gene target, was measured using a Phosphorimager. Gene expression was 
reported as Phosphor Stimulating Luminescence (PSL) which reflects the level of 
phosphor signal generated from the probe hybridized to each of the gene targets 
represented on the array. A local background signal subtraction was performed before the 
total signal generated from each array was used to normalize gene expression between the 
different hybridizations. The value presented after "[array code]:" represents the mean of 
the duplicate values, following background subtraction and probe normalization. One of 
skill in the art could routinely use this information to identify normal and/or diseased 
tissue(s) which show a predominant expression pattern of the corresponding 
polynucleotide of the invention or to identify polynucleotides which show predominant 
and/or specific tissue and/or cell expression. Column 9 provides the chromosomal 
location of polynucleotides corresponding to SEQ ID NO:X. Chromosomal location was 
determined by finding exact matches to EST and cDNA sequences contained in the NCBI 
(National Center for Biotechnology Information) UniGene database. Given a presumptive 
chromosomal location, disease locus association was determined by comparison with the 
Morbid Map, derived from Online Mendelian Inheritance in Man (Online Mendelian 
Inheritance in Man, OMIM™. McKusick-Nathans Institute for Genetic Medicine, Johns 
Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, 
National Library of Medicine (Bethesda, MD) 2000. World Wide Web URL: 
http://www.ncbi.nlm.nih.gov/omim/). If the putative chromosomal location of the Query 
overlaps with the chromosomal location of a Morbid Map entry, an OMIM identification 
number is disclosed in column 10 labeled "OMIM Disease Reference(s)". A key to the 
OMIM reference identification numbers is provided in Table 5. Column 11 provides the 
amino acid position of the ALOM hit(s) predicted for the amino acid sequence shown in 
SEQ ID NO:Y. 
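
Two of the computations described for Table 1 can be sketched in code. The first function applies the stated epitope rule (antigenic index greater than 0.9 over at least 6 amino acids), read here as an unbroken run of residues each scoring above the cutoff. The second mirrors the array processing steps (local background subtraction, normalization to total array signal, mean of duplicates). The function names, the input layout, the reading of the rule as a contiguous run, and the clamping of negative values to zero are all assumptions made for illustration; neither function reproduces the GCG program or the actual array software.

```python
def predicted_epitope_regions(antigenic_index, cutoff=0.9, min_length=6):
    """Return 1-based inclusive (start, end) runs in which every residue's
    antigenic index exceeds `cutoff` and the run spans >= `min_length`."""
    regions = []
    run_start = None
    for position, score in enumerate(antigenic_index, start=1):
        if score > cutoff:
            if run_start is None:
                run_start = position
        else:
            if run_start is not None and position - run_start >= min_length:
                regions.append((run_start, position - 1))
            run_start = None
    # Close a run that extends to the final residue.
    last = len(antigenic_index)
    if run_start is not None and last + 1 - run_start >= min_length:
        regions.append((run_start, last))
    return regions


def normalized_expression(raw_psl, local_background):
    """Hypothetical layout: both arguments map a gene target to its two
    duplicate spot values (PSL) from one hybridization.

    Each spot is background-subtracted (negative results clamped to zero,
    an assumption), the array total is used as the normalization factor,
    and the mean of the duplicates is returned per gene target."""
    corrected = {
        gene: [max(signal - background, 0.0)
               for signal, background in zip(raw_psl[gene], local_background[gene])]
        for gene in raw_psl
    }
    array_total = sum(sum(values) for values in corrected.values())
    if array_total == 0:
        raise ValueError("array has no signal above background")
    return {
        gene: (sum(values) / len(values)) / array_total
        for gene, values in corrected.items()
    }
```

Under these assumptions, the values returned by `normalized_expression` are fractions of the total corrected signal on one array, so they can be compared across hybridizations; a real pipeline might rescale them to a different unit.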

[0035] Table 2 summarizes homology and features of some of the polypeptides of the 
invention. The first column provides a unique clone identifier, "Clone ID NO:Z", 
corresponding to a cDNA clone disclosed in Table 1. The second column provides the 
unique contig identifier, "Contig ID:" corresponding to contigs in Table 1 and allowing 
for correlation with the information in Table 1. The third column provides the sequence 
identifier, "SEQ ID NO:X", for the contig polynucleotide sequence. The fourth column 
provides the analysis method by which the homology/identity disclosed in the Table was 
determined. Comparisons were made between polypeptides encoded by the 
polynucleotides of the invention and either a non-redundant protein database (herein 
referred to as "NR"), or a database of protein families (herein referred to as "PFAM") as 
further described below. The fifth column provides a description of the PFAM/NR hit 
having a significant match to a polypeptide of the invention. Column six provides the 
accession number of the PFAM/NR hit disclosed in the fifth column. Column seven, 
"Score/Percent Identity", provides a quality score or the percent identity, of the hit 
disclosed in columns five and six. Columns 8 and 9, "NT From" and "NT To" 
respectively, delineate the polynucleotides in "SEQ ID NO:X" that encode a polypeptide 
having a significant match to the PFAM/NR database as disclosed in the fifth and sixth 
columns. In specific embodiments, polypeptides of the invention comprise, or 
alternatively consist of, an amino acid sequence encoded by a polynucleotide in SEQ ID 
NO:X as delineated in columns 8 and 9, or fragments or variants thereof. 
[0036] Table 3 provides polynucleotide sequences that may be disclaimed according to 
certain embodiments of the invention. The first column provides a unique clone identifier, 
"Clone ID", for a cDNA clone related to contig sequences disclosed in Table 1. The 
second column provides the sequence identifier, "SEQ ID NO:X", for contig sequences 
disclosed in Table 1. The third column provides the unique contig identifier, "Contig 
ID:", for contigs disclosed in Table 1. The fourth column provides a unique integer 'a' 
where 'a' is any integer between 1 and the final nucleotide minus 15 of SEQ ID NO:X, 
and the fifth column provides a unique integer 'b' where 'b' is any integer between 15 and 
the final nucleotide of SEQ ID NO:X, where both a and b correspond to the positions of 
nucleotide residues shown in SEQ ID NO:X, and where b is greater than or equal to a + 
14. For each of the polynucleotides shown as SEQ ID NO:X, the uniquely defined integers 
can be substituted into the general formula of a-b, and used to describe polynucleotides 
which may be preferably excluded from the invention. In certain embodiments, preferably 
excluded from the invention are at least one, two, three, four, five, ten, or more of the 
polynucleotide sequence(s) having the accession number(s) disclosed in the sixth column 
of this Table. In further embodiments, preferably excluded from the invention are the 
specific polynucleotide sequence(s) contained in the clones corresponding to at least one, 
two, three, four, five, ten, or more of the available material having the accession numbers 
identified in the sixth column of this Table. 
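
The constraints on the integers a and b in the general formula a-b can be checked mechanically. The sketch below is illustrative only: it takes the stated bounds literally (a from 1 to the final nucleotide position minus 15, b from 15 to the final nucleotide position, and b at least a + 14), treats positions as 1-based and inclusive, and uses hypothetical function names.

```python
def is_valid_a_b(a, b, sequence_length):
    """Check the stated bounds on a and b for a sequence of the given length."""
    return (
        1 <= a <= sequence_length - 15
        and 15 <= b <= sequence_length
        and b >= a + 14
    )


def fragment_a_to_b(sequence, a, b):
    """Return the nucleotides at positions a through b (1-based, inclusive)."""
    if not is_valid_a_b(a, b, len(sequence)):
        raise ValueError(f"a={a}, b={b} violate the stated constraints")
    return sequence[a - 1:b]
```

Because b must be at least a + 14, every fragment described this way is at least 15 nucleotides long.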

[0037] Table 4 provides a key to the tissue/cell source identifier code disclosed in 
Table 1, column 8. Column 1 provides the tissue/cell source identifier code disclosed in 
Table 1, Column 8. Columns 2-5 provide a description of the tissue or cell source. Codes 
corresponding to diseased tissues are indicated in column 6 with the word "disease". The 
use of the word "disease" in column 6 is non-limiting. The tissue or cell source may be 
specific (e.g., a neoplasm), or may be disease-associated (e.g., a tissue sample from a 
normal portion of a diseased organ). Furthermore, tissues and/or cells lacking the 
"disease" designation may still be derived from sources directly or indirectly involved in a 
disease state or disorder, and therefore may have a further utility in that disease state or 
disorder. In numerous cases where the tissue/cell source is a library, column 7 identifies 
the vector used to generate the library. 

[0038] Table 5 provides a key to the OMIM reference identification numbers disclosed 
in Table 1, column 10. OMIM reference identification numbers (Column 1) were derived 
from Online Mendelian Inheritance in Man (Online Mendelian Inheritance in Man, 
OMIM. McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University 
(Baltimore, MD) and National Center for Biotechnology Information, National Library of 
Medicine, (Bethesda, MD) 2000. World Wide Web URL: 
http://www.ncbi.nlm.nih.gov/omim/). Column 2 provides diseases associated with the 
cytologic band disclosed in Table 1, column 9, as determined using the Morbid Map 
database. 
[0039] 


Definitions 

[0040] The following definitions are provided to facilitate understanding of certain 
terms used throughout this specification. 

[0041] In the present invention, "isolated" refers to material removed from its original 
environment (e.g., the natural environment if it is naturally occurring), and thus is altered 
"by the hand of man" from its natural state. For example, an isolated polynucleotide could 
be part of a vector or a composition of matter, or could be contained within a cell, and still 
be "isolated" because that vector, composition of matter, or particular cell is not the 
original environment of the polynucleotide. The term "isolated" does not refer to genomic 
or cDNA libraries, whole cell total or mRNA preparations, genomic DNA preparations 
(including those separated by electrophoresis and transferred onto blots), sheared whole 
cell genomic DNA preparations or other compositions where the art demonstrates no 
distinguishing features of the polynucleotide/sequences of the present invention. 
[0042] As used herein, a "polynucleotide" refers to a molecule having a nucleic acid 
sequence encoding SEQ ID NO:Y or a fragment or variant thereof; a nucleic acid 
sequence contained in SEQ ID NO:X (as described in column 3 of Table 1) or the 
complement thereof; a cDNA sequence contained in Clone ID NO:Z (as described in 
column 2 of Table 1 and contained within the ATCC Deposit). For example, the 
polynucleotide can contain the nucleotide sequence of the full length cDNA sequence, 
including the 5' and 3' untranslated sequences, the coding region, as well as fragments, 
epitopes, domains, and variants of the nucleic acid sequence. Moreover, as used herein, a 
"polypeptide" refers to a molecule having an amino acid sequence encoded by a 
polynucleotide of the invention as broadly defined (obviously excluding poly-Phenylalanine 
or poly-Lysine peptide sequences which result from translation of a polyA 
tail of a sequence corresponding to a cDNA). 

[0043] In the present invention, "SEQ ID NO:X" was often generated by overlapping 
sequences contained in multiple clones (contig analysis). A representative clone 
containing all or most of the sequence for SEQ ID NO:X is deposited at Human Genome 
Sciences, Inc. (HGS) in a catalogued and archived library. As shown, for example, in 
column 2 of Table 1, each clone is identified by a cDNA Clone ID (identifier generally 
referred to herein as Clone ID NO:Z). Each Clone ID is unique to an individual clone and 
the Clone ID is all the information needed to retrieve a given clone from the HGS library. 
Furthermore, clones disclosed in this application have been deposited with the ATCC on 
March 24, 2000, having the ATCC designation numbers PTA-1559. The ATCC is located 
at 10801 University Boulevard, Manassas, Virginia 20110-2209, USA. The ATCC 
deposits were made pursuant to the terms of the Budapest Treaty on the international 
recognition of the deposit of microorganisms for the purposes of patent procedure. 
[0044] In specific embodiments, the polynucleotides of the invention are at least 15, at 
least 30, at least 50, at least 100, at least 125, at least 500, or at least 1000 continuous 
nucleotides but are less than or equal to 300 kb, 200 kb, 100 kb, 50 kb, 15 kb, 10 kb, 
7.5kb, 5 kb, 2.5 kb, 2.0 kb, or 1 kb, in length. In a further embodiment, polynucleotides of 
the invention comprise a portion of the coding sequences, as disclosed herein, but do not 
comprise all or a portion of any intron. In another embodiment, the polynucleotides 
comprising coding sequences do not contain coding sequences of a genomic flanking gene 
(i.e., 5' or 3' to the gene of interest in the genome). In other embodiments, the 
polynucleotides of the invention do not contain the coding sequence of more than 1000, 
500, 250, 100, 50, 25, 20, 15, 10, 5, 4, 3, 2, or 1 genomic flanking gene(s). 
[0045] A "polynucleotide" of the present invention also includes those polynucleotides 
capable of hybridizing, under stringent hybridization conditions, to sequences contained in 
SEQ ID NO:X, or the complement thereof (e.g., the complement of any one, two, three, 
four, or more of the polynucleotide fragments described herein), the polynucleotide 
sequence delineated in columns 8 and 9 of Table 2 or the complement thereof and/or 
cDNA sequences contained in Clone ID NO:Z (e.g., the complement of any one, two, 
three, four, or more of the polynucleotide fragments, or the cDNA clone within the pool of 
cDNA clones deposited with the ATCC, described herein). "Stringent hybridization 
conditions" refers to an overnight incubation at 42 degree C in a solution comprising 50% 
formamide, 5x SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate 
(pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20 μg/ml denatured, sheared 
salmon sperm DNA, followed by washing the filters in 0.1x SSC at about 65 degree C. 
[0046] Also contemplated are nucleic acid molecules that hybridize to the 
polynucleotides of the present invention at lower stringency hybridization conditions. 
Changes in the stringency of hybridization and signal detection are primarily 
accomplished through the manipulation of formamide concentration (lower percentages of 
formamide result in lowered stringency), salt conditions, or temperature. For example, 
lower stringency conditions include an overnight incubation at 37 degree C in a solution 
comprising 6X SSPE (20X SSPE = 3M NaCl; 0.2M NaH2PO4; 0.02M EDTA, pH 7.4), 
0.5% SDS, 30% formamide, 100 μg/ml salmon sperm blocking DNA; followed by washes 
at 50 degree C with 1X SSPE, 0.1% SDS. In addition, to achieve even lower stringency, 
washes performed following stringent hybridization can be done at higher salt 
concentrations (e.g. 5X SSC). 
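
The buffer strengths quoted in the hybridization conditions above can be converted to component concentrations with a simple dilution calculation. The following Python sketch is illustrative only. The 20X SSPE composition is taken from the text; the 20X SSC composition is back-calculated from the stated 5x SSC values (750 mM NaCl, 75 mM trisodium citrate) and is therefore an inference. The function and dictionary names are hypothetical.

```python
# Stock compositions in mol/L.
# 20X SSPE is quoted in the text; 20X SSC is inferred from the stated
# composition of 5x SSC (750 mM NaCl, 75 mM trisodium citrate).
STOCKS_20X = {
    "SSC": {"NaCl": 3.0, "trisodium citrate": 0.3},
    "SSPE": {"NaCl": 3.0, "NaH2PO4": 0.2, "EDTA": 0.02},
}


def working_concentrations_mM(buffer: str, strength_x: float) -> dict:
    """Return component concentrations (mM) of a buffer at the given X strength."""
    factor = strength_x / 20.0  # dilution relative to the 20X stock
    return {
        component: round(molar * factor * 1000.0, 6)
        for component, molar in STOCKS_20X[buffer].items()
    }


if __name__ == "__main__":
    for name, strength in [("SSC", 5), ("SSC", 0.1), ("SSPE", 6), ("SSPE", 1)]:
        print(f"{strength}X {name}:", working_concentrations_mM(name, strength))
```

Under these assumptions, the 0.1x SSC stringent wash contains far less salt than the 1X SSPE lower-stringency wash, which is consistent with the statement that higher salt concentrations lower the stringency.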

[0047] Note that variations in the above conditions may be accomplished through the 
inclusion and/or substitution of alternate blocking reagents used to suppress background in 
hybridization experiments. Typical blocking reagents include Denhardt's reagent, 
BLOTTO, heparin, denatured salmon sperm DNA, and commercially available 
proprietary formulations. The inclusion of specific blocking reagents may require 
modification of the hybridization conditions described above, due to problems with 
compatibility. 

[0048] Of course, a polynucleotide which hybridizes only to polyA+ sequences (such 
as any 3' terminal polyA+ tract of a cDNA shown in the sequence listing), or to a 
complementary stretch of T (or U) residues, would not be included in the definition of 
"polynucleotide," since such a polynucleotide would hybridize to any nucleic acid 
molecule containing a poly (A) stretch or the complement thereof (e.g., practically any 
double-stranded cDNA clone generated using oligo dT as a primer). 
[0049] The polynucleotide of the present invention can be composed of any 
polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or 
modified RNA or DNA. For example, polynucleotides can be composed of single- and 
double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, 
single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded 
regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, 
more typically, double-stranded or a mixture of single- and double-stranded regions. In 
addition, the polynucleotide can be composed of triple-stranded regions comprising RNA 
or DNA or both RNA and DNA. A polynucleotide may also contain one or more 
modified bases or DNA or RNA backbones modified for stability or for other reasons. 
"Modified" bases include, for example, tritylated bases and unusual bases such as inosine. 
A variety of modifications can be made to DNA and RNA; thus, "polynucleotide" 
embraces chemically, enzymatically, or metabolically modified forms. 
[0050] The polypeptide of the present invention can be composed of amino acids 
joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres, 
and may contain amino acids other than the 20 gene-encoded amino acids. The 
polypeptides may be modified by either natural processes, such as posttranslational 
processing, or by chemical modification techniques which are well known in the art. Such 
modifications are well described in basic texts and in more detailed monographs, as well 
as in a voluminous research literature. Modifications can occur anywhere in a 
polypeptide, including the peptide backbone, the amino acid side-chains and the amino or 
carboxyl termini. It will be appreciated that the same type of modification may be present 
in the same or varying degrees at several sites in a given polypeptide. Also, a given 
polypeptide may contain many types of modifications. Polypeptides may be branched, for 
example, as a result of ubiquitination, and they may be cyclic, with or without branching. 
Cyclic, branched, and branched cyclic polypeptides may result from posttranslation 
natural processes or may be made by synthetic methods. Modifications include 
acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, 
covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide 
derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of 
phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, 
formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, 
formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, 
iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, 
phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA 
mediated addition of amino acids to proteins such as arginylation, and ubiquitination. 
(See, for instance, PROTEINS - STRUCTURE AND MOLECULAR PROPERTIES, 2nd 
Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993); 
POSTTRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C. 
Johnson, Ed., Academic Press, New York, pgs. 1-12 (1983); Seifter et al., Meth. Enzymol. 
182:626-646 (1990); Rattan et al., Ann. N.Y. Acad. Sci. 663:48-62 (1992)). 
[0051] "SEQ ID NO:X" refers to a polynucleotide sequence described, for example, in 
Tables 1 or 2, while "SEQ ID NO:Y" refers to a polypeptide sequence described in 
column 6 of Table 1. SEQ ID NO:X is identified by an integer specified in column 4 of 
Table 1. The polypeptide sequence SEQ ID NO:Y is a translated open reading frame 
(ORF) encoded by polynucleotide SEQ ID NO:X. "Clone ID NO:Z" refers to a cDNA 
clone described in column 2 of Table 1. 

[0052] "A polypeptide having functional activity" refers to a polypeptide capable of 
displaying one or more known functional activities associated with a full-length 
(complete) protein. Such functional activities include, but are not limited to, biological 
activity, antigenicity [ability to bind (or compete with a polypeptide for binding) to an 
anti-polypeptide antibody], immunogenicity (ability to generate antibody which binds to a 
specific polypeptide of the invention), ability to form multimers with polypeptides of the 
invention, and ability to bind to a receptor or ligand for a polypeptide. 
[0053] The polypeptides of the invention can be assayed for functional activity (e.g. 
biological activity) using or routinely modifying assays known in the art, as well as assays 
described herein. Specifically, one of skill in the art may routinely assay polypeptides 
(including fragments and variants) of the invention for activity using assays as described 
in the Examples. 

[0054] "A polypeptide having biological activity" refers to a polypeptide exhibiting 
activity similar to, but not necessarily identical to, an activity of a polypeptide of the 
present invention, including mature forms, as measured in a particular biological assay, 
with or without dose dependency. In the case where dose dependency does exist, it need 
not be identical to that of the polypeptide, but rather substantially similar to the dose- 
dependence in a given activity as compared to the polypeptide of the present invention 
(i.e., the candidate polypeptide will exhibit greater activity or not more than about 25-fold 
less and, preferably, not more than about tenfold less activity, and most preferably, not 
more than about three-fold less activity relative to the polypeptide of the present 
invention). 
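
The fold-activity criterion above can be expressed as a simple comparison. The following Python sketch is illustrative only: the function name, the parameter names, and the assumption that activity is measured as a positive number in the same assay for both polypeptides are hypothetical and are not taken from the specification.

```python
def has_similar_activity(candidate: float, reference: float,
                         max_fold_less: float = 25.0) -> bool:
    """Return True if the candidate activity is greater than the reference
    activity, or is not more than max_fold_less-fold lower.

    Both values are assumed to be positive readouts from the same assay.
    """
    if reference <= 0 or max_fold_less <= 0:
        raise ValueError("reference and max_fold_less must be positive")
    return candidate >= reference / max_fold_less
```

Setting max_fold_less to 10 or 3 corresponds to the preferred and most preferred tiers recited above.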

(including contig sequences (SEQ ID NO:X) and clones (Clone ID NO:Z)) and further 
summarizes certain characteristics of these polynucleotides and the polypeptides encoded 
thereby. 


It has been discovered herein that the polynucleotides described in Table 1 are 
predicted to be localized to the plasma membrane of human cells. Accordingly, such 
polynucleotides, polypeptides encoded by such polynucleotides, and antibodies 
for such polypeptides find use in the diagnosis, treating and prevention of diseases 
associated with cell proliferation and cell signaling, and neuronal disorders. 

Plasma membrane location was predicted using the following method. All 
novel contigs in the HGS database were scored using the ALOM program developed by 
Klein et al. to detect potential transmembrane segments (Klein, P. et al., Biochim. 
Biophys. Acta 815:468 (1985); which is hereby incorporated by reference in its entirety 
herein). ALOM attempts to identify the most probable transmembrane segment from the 
average hydrophobicity value of 17-residue segments, if any. It predicts whether that 
segment is a transmembrane segment (INTEGRAL) or not (PERIPHERAL) comparing 
the discriminant score (reported as "value") with a threshold parameter pre-defined to be 0.0 
for bacteria ("threshold"). For an integral membrane protein, position(s) of transmembrane 
segment(s) are also reported. Their length is fixed at 17 but their extension, i.e., the [...] 
discrimination step mentioned above is continued after leaving out the segment till there 
are no predicted transmembrane segments. The item "count" is the number of predicted 
transmembrane segments. 
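
The sliding-window idea behind the method described above can be sketched in Python as follows. This is not the ALOM program: ALOM applies the discriminant function of Klein et al., whereas this sketch substitutes the Kyte-Doolittle hydropathy scale and a hypothetical cutoff of 1.6, purely to illustrate averaging over 17-residue segments and repeating the search after leaving out each predicted segment. All names in the code are illustrative.

```python
# Kyte-Doolittle hydropathy values (a substitute scale, not ALOM's own).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
WINDOW = 17        # segment length stated in the text
THRESHOLD = 1.6    # hypothetical cutoff for this sketch only


def best_window(seq, excluded):
    """Return (start, mean) of the most hydrophobic window that avoids
    already-excluded positions, or None if no window is available."""
    best = None
    for start in range(len(seq) - WINDOW + 1):
        positions = range(start, start + WINDOW)
        if any(i in excluded for i in positions):
            continue
        mean = sum(KD.get(seq[i], 0.0) for i in positions) / WINDOW
        if best is None or mean > best[1]:
            best = (start, mean)
    return best


def predict_segments(seq, threshold=THRESHOLD):
    """Repeatedly pick the best window, record it, leave it out, and
    search again until no window reaches the threshold.
    Returns 1-based inclusive (start, end) positions."""
    seq = seq.upper()
    excluded, segments = set(), []
    while True:
        hit = best_window(seq, excluded)
        if hit is None or hit[1] < threshold:
            break
        start = hit[0]
        segments.append((start + 1, start + WINDOW))
        excluded.update(range(start, start + WINDOW))
    return sorted(segments)


def classify(seq, threshold=THRESHOLD):
    """Label a protein INTEGRAL if any segment is predicted, else PERIPHERAL."""
    return "INTEGRAL" if predict_segments(seq, threshold) else "PERIPHERAL"
```

In this sketch the number of predicted transmembrane segments is simply len(predict_segments(seq)).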

The protein sequence used was the longest start-codon to stop-codon (or end of 
sequence) ORF. If the ORF was at least 100 amino acids long, and there was a predicted 
Met, the contig was selected as encoding a plasma-membrane-associated protein. The 
polynucleotides of the invention are predicted to be plasma membrane associated and 
comprise the predicted INTEGRAL membrane domains for each unique contig ID shown 
in column 11 of Table 1. 
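
The ORF selection step described above can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: only the three forward reading frames are scanned, ATG is the only start codon, the standard stop codons TAA, TAG and TGA are used, and the stop codon is not counted toward the length. Only the length criterion is implemented. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, n_codons) for the longest ORF that begins at an
    ATG and runs to the first in-frame stop codon or to the end of the
    sequence. Coordinates are 0-based, end-exclusive, and exclude the stop."""
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i              # first start codon opens the ORF
            elif codon in STOPS:
                n = (i - start) // 3
                if n > best[2]:
                    best = (start, i, n)
                start = None               # look for the next ORF in this frame
        if start is not None:              # ORF runs to the end of the sequence
            end = start + ((len(seq) - start) // 3) * 3
            n = (end - start) // 3
            if n > best[2]:
                best = (start, end, n)
    return best


def meets_length_cutoff(seq, min_aa=100):
    """Apply the 'at least 100 amino acids' criterion to the longest ORF."""
    return longest_orf(seq)[2] >= min_aa
```

A contig passing meets_length_cutoff would then be checked for a predicted INTEGRAL membrane domain as described above.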
L0776: 1, L0379: 1, L0657: 
1, L0809: 1, L0519: 1, 
L0791: 1, L0663: 1, H0684: 
1, L0804: 1, L0774: 1, 
H0547: 1, H0519: 1, H0435: 
1, S0380: 1, L0759: 1, 
H0445: 1, H0542: 1 and 
H0506: 1. 
AR039: 10, AR104: 6 
AR089: 3, AR060: 3, 
AR053: 2, AR096: 2, 
AR055: 1, AR039: 1, 
AR104: 1, AR061: 1 
L0749: 5, S0410: 4, S0422: 
Leu-26 to Gly-33, 
Glu-75 to Lys-80. 
Thr-90 to Asp-100, 
Leu-108 to Ala-120. 